The given Incident dataset was chosen for the “Exploratory Data Analysis and Data Visualization” coursework. The major insights are drawn after preparing and pre-processing the dataset, and they are presented as bar charts, pie charts, histograms, box plots, scatter plots, correlation analysis, etc.
The necessary libraries are imported in order to perform the operations on the given dataset.
The given dataset is loaded by calling the read.csv function.
The dataset is viewed by calling the view function; it contains 2384 rows and 47 columns in total.
The header and the first data row carry the same reference labels, so the repeated row is removed because it adds no information. The numbers of rows and columns are checked after the removal.
## [1] 2383 47
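The loading and dimension check can be sketched in base R as follows. This is only an illustration on a tiny inline table, not the coursework's actual code: the column names and values are invented. It shows that dim() reports rows first and columns second, so removing a row changes the first number while removing a column changes the second.

```r
# Hypothetical miniature of the incident file; names and values are invented.
csv_text <- "id,subject_race,officer_race
label_id,label_subject,label_officer
1,Black,White
2,Hispanic,White
3,White,Black"

incidents <- read.csv(text = csv_text, stringsAsFactors = FALSE)
dim(incidents)                 # rows, then columns

# Dropping the first data row (a row of repeated labels) changes the row count.
no_label_row <- incidents[-1, ]
dim(no_label_row)

# Dropping a column instead changes the column count.
no_second_col <- incidents[, -2]
dim(no_second_col)
```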
Null values in the dataset are converted to NAs.
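One way to perform this conversion is at load time, through the na.strings argument of read.csv. The sketch below assumes that nulls appear as empty fields or as the literal text NULL; which markers the real file uses is an assumption, and the sample data is invented.

```r
# Invented sample; the null markers ("" and "NULL") are assumptions.
csv_text <- "id,subject_race,injury
1,Black,NULL
2,,No
3,White,Yes"

incidents <- read.csv(text = csv_text,
                      na.strings = c("", "NULL", "NA"),
                      stringsAsFactors = FALSE)

# Count the missing values in each column after the conversion.
missing_per_column <- colSums(is.na(incidents))
missing_per_column
```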
The pie chart depicts the share of each subject race in the recorded incidents. Black subjects are involved in the most incidents (1333), followed by Hispanic subjects (524) and White subjects (470). Asian subjects are involved the least, with just 1 incident recorded.
The bar chart below shows the count of each officer race with respect to subject race. White officers arrested mostly Black subjects and arrested American Ind subjects the least.
The histogram below shows the number of years the officers have served on the force. Most of the officers (approximately 600) are new, and only a few have served for more than 30 years.
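A histogram of this kind can be produced with base R's hist function. The values below are invented, and the 5-year bin width is an assumption rather than the setting used for the figure.

```r
# Invented years of service; the 5-year bins are an assumption.
years_on_force <- c(1, 2, 2, 3, 3, 4, 6, 8, 9, 12, 15, 21, 28, 33)

h <- hist(years_on_force,
          breaks = seq(0, 35, by = 5),
          main = "Officer years on force",
          xlab = "Years on force",
          ylab = "Number of officers")

# Number of officers in each bin, from shortest to longest service.
h$counts
```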
The heat map shows how many subjects of each subject_race were arrested by officers of each officer_race. White officers arrested around 846 Black subjects, the highest count, while Asian and American Ind subjects were arrested the least.
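The counts behind such a heat map are a cross-tabulation of the two race columns. The following is a minimal base R sketch on invented rows; only the column names officer_race and subject_race come from the text, and base heatmap() stands in for whatever plotting function the figure was made with.

```r
# Invented incidents; only the two column names are taken from the report.
incidents <- data.frame(
  officer_race = c("White", "White", "Black", "White", "Hispanic", "Black"),
  subject_race = c("Black", "Black", "Black", "Hispanic", "White", "White"),
  stringsAsFactors = FALSE
)

# Cross-tabulate officer race (rows) against subject race (columns).
counts <- table(officer = incidents$officer_race,
                subject = incidents$subject_race)

# Find the officer/subject pair with the highest count.
cells <- as.data.frame(counts, stringsAsFactors = FALSE)
top <- cells[which.max(cells$Freq), ]
top

# Draw the table as a heat map without reordering rows or columns.
heatmap(unclass(counts), Rowv = NA, Colv = NA, scale = "none",
        xlab = "Subject race", ylab = "Officer race")
```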
An interactive plot is drawn that smooths out the variation and noise in the data. The main purpose of drawing this line is to estimate and plot a trend showing the relationship between two variables: incident counts and months.
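The smoothing idea can be illustrated with base R's lowess function, which fits a locally weighted trend through noisy points. The report does not name the smoother it used, so lowess is a stand-in here, and the monthly counts are invented.

```r
# Invented monthly incident counts; lowess is a stand-in smoother.
month <- 1:12
incident_count <- c(210, 190, 230, 205, 220, 180, 170, 175, 200, 190, 160, 150)

# f controls the span: larger values give a smoother trend line.
trend <- lowess(month, incident_count, f = 2 / 3)

plot(month, incident_count,
     xlab = "Month", ylab = "Incident count",
     main = "Monthly incidents with smoothed trend")
lines(trend, col = "blue", lwd = 2)
```

A smaller value of f follows the individual months more closely, while a larger value emphasises the overall direction.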
The interactive plot, drawn using plotly, shows the incident rates by hour of the day in the year 2016; the values appear as you hover the mouse over the graph. The hour 20:00 has the highest average incident rate (181), whereas the hour 07:00 has the lowest average incident rate (20).
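A per-hour plotly chart could be built along these lines. This is a sketch under assumptions: the time strings are invented, the hour is taken from the first two characters of an HH:MM value, and the plotly part only runs if the package is installed.

```r
# Invented incident times in HH:MM format (the real format is an assumption).
incident_time <- c("20:15", "20:40", "07:05", "13:30", "20:55", "13:10")

# Extract the hour and count incidents per hour.
hour <- substr(incident_time, 1, 2)
hourly <- as.data.frame(table(hour = hour),
                        responseName = "incidents",
                        stringsAsFactors = FALSE)
hourly

# Build the interactive line chart only when plotly is available.
if (requireNamespace("plotly", quietly = TRUE)) {
  fig <- plotly::plot_ly(hourly, x = ~hour, y = ~incidents,
                         type = "scatter", mode = "lines+markers")
  fig <- plotly::layout(fig,
                        xaxis = list(title = "Hour of day"),
                        yaxis = list(title = "Incidents"))
  # In an interactive session, print `fig` to open the chart with hover labels.
}
```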
An interactive density plot of the incident distribution is shown. The density curve shows how the number of incidents is distributed.
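A density estimate of incident counts can be sketched with base R's density function. The daily counts are invented, and the static base plot is a simplification of the interactive figure.

```r
# Invented number of incidents per day.
daily_incidents <- c(3, 5, 4, 6, 8, 7, 5, 4, 9, 6, 5, 7, 10, 4, 6)

# Kernel density estimate with the default Gaussian kernel and bandwidth.
d <- density(daily_incidents)

plot(d,
     main = "Density of daily incident counts",
     xlab = "Incidents per day")
```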
The box plot shows the years of service of the officers who were injured. 36 of the injured officers have been on the force for more than 30 years. The median length of service among injured officers is 7.5 years.
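The box plot and the median can be reproduced in outline with base R. The data frame below is invented, including its column names, and was constructed so that the injured group has a median of 7.5 years purely to mirror the figure quoted in the text.

```r
# Invented data; column names and values are hypothetical.
officers <- data.frame(
  years_on_force = c(2, 5, 7, 8, 12, 31, 3, 4, 10, 20),
  injured = c("Yes", "Yes", "Yes", "Yes", "Yes", "Yes",
              "No", "No", "No", "No"),
  stringsAsFactors = FALSE
)

# One box per injury status, showing the spread of years of service.
boxplot(years_on_force ~ injured, data = officers,
        xlab = "Officer injured", ylab = "Years on force")

# Median years of service within each group.
med <- tapply(officers$years_on_force, officers$injured, median)
med
```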
The geographical map shows the exact location (latitude and longitude) of each incident, along with the reason for the incident and the subject’s current status, such as arrested or accidental discharge.
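A map of this kind is commonly built in R with the leaflet package; the report does not say which package it used, so this is an assumption. The coordinates, column names and popup text below are all invented, and the map is only constructed if leaflet is installed.

```r
# Invented locations; column names are hypothetical.
incidents <- data.frame(
  latitude       = c(32.78, 32.80, 32.75),
  longitude      = c(-96.80, -96.77, -96.83),
  reason         = c("Arrest", "Service call", "Traffic stop"),
  subject_status = c("Arrested", "Arrested", "Released"),
  stringsAsFactors = FALSE
)

# Popup text combining the reason and the subject's status.
incidents$popup_text <- paste(incidents$reason, incidents$subject_status,
                              sep = ": ")

if (requireNamespace("leaflet", quietly = TRUE)) {
  m <- leaflet::leaflet(incidents)
  m <- leaflet::addTiles(m)                      # default background map tiles
  m <- leaflet::addCircleMarkers(m,
                                 lng = ~longitude, lat = ~latitude,
                                 popup = ~popup_text)
  # In an interactive session, print `m` to view the map.
}
```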
The bar chart shows the percentage of crimes by race. Black subjects are involved in the most incidents (55.91%), followed by Hispanic subjects (21.98%); CitiRace and American Ind are involved the least, with only 0.04%.
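The percentages follow from dividing each count by the total number of records. The worked example below uses the three counts quoted earlier in the report; the total of 2384 is an assumption, chosen because it reproduces the reported percentages.

```r
# Counts quoted in the report; the total of 2384 is an assumption.
counts <- c(Black = 1333, Hispanic = 524, White = 470)
total <- 2384

# Share of each race as a percentage, rounded to two decimals.
pct <- round(100 * counts / total, 2)
pct

barplot(pct, ylab = "Share of incidents (%)", main = "Incidents by subject race")
```

With a full vector of race labels rather than pre-computed counts, prop.table(table(x)) gives the same shares directly.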
Visualizing and analyzing the policing data was hard at first because the dataset had missing values, so it was cleaned and preprocessed. Several racial patterns emerge: Black subjects were reportedly involved in most of the incidents and were mostly arrested by White officers, while American Ind and Asian subjects were the least involved in incidents and the least arrested.
References: